A/B Testing Online Store

Alexander Feldman V.1.0

This project explores behavior of clients of the online store.
The goals of the project: testing changes of behavior of users related to the introduction of an improved new interface of the site.

1. Technical descriptions

Current version of technical description

  • Test name: interface_eu_test
  • Groups: А (control), B (tested group)
  • Launch date: 2020-12-07
  • The date when they stopped taking up new users: 2020-12-16
  • End date: 2020-12-30
  • Audience: 14.3%% of the new users from the EU region
  • Purpose of the test: testing changes related to the introduction of an improved new interface of the site.
  • Expected result: within 14 days of signing up, users will show better conversion into product page views (the product_page event), product cart views (product_cart) and purchases (purchase). At each stage of the funnel product_page → product_card → purchase, there will be at least a 10% increase.

2. EDA stage

In [1]:
#!pip install pandas_profiling
#!pip install plotly.express --upgrade
#!pip install plotly --upgrade
In [2]:
# import libraries
import math
import pandas as pd
import numpy as np
import scipy.stats as st
from matplotlib import pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly import graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
from pandas_profiling import ProfileReport
In [3]:
# open the dataset

participants = pd.read_csv('datasets/final_ab_participants_upd.csv') # path for local working
new_users = pd.read_csv('datasets/final_ab_new_users_upd.csv')
events = pd.read_csv('datasets/final_ab_events_us.csv')
In [4]:
display(events.head(), new_users.head(), participants.head())
user_id event_dt event_name details
0 E1BDDCE0DAFA2679 2020-12-07 20:22:03 purchase 99.99
1 7B6452F081F49504 2020-12-07 09:22:53 purchase 9.99
2 9CD9F34546DF254C 2020-12-07 12:59:29 purchase 4.99
3 96F27A054B191457 2020-12-07 04:02:40 purchase 4.99
4 1FD7660FDF94CA1F 2020-12-07 10:15:09 purchase 4.99
user_id first_date region device
0 D72A72121175D8BE 2020-12-07 EU PC
1 F1C668619DFE6E65 2020-12-07 N.America Android
2 2E1BF1D4C37EA01F 2020-12-07 EU PC
3 50734A22C0C63768 2020-12-07 EU iPhone
4 E1BDDCE0DAFA2679 2020-12-07 N.America iPhone
user_id group ab_test
0 D1ABA3E2887B6A73 A recommender_system_test
1 A7A3664BD6242119 A recommender_system_test
2 DABC14FDDFADD29E A recommender_system_test
3 04988C5DF189632E A recommender_system_test
4 4FF2998A348C484F A recommender_system_test
In [5]:
# study datasets with pandas_profiling
events_profile = ProfileReport(events, title='Events Report', explorative=True)
participants_profile = ProfileReport(participants, title='Participants Report', explorative=True)
new_users_profile = ProfileReport(new_users, title='New Users Report', explorative=True)
In [6]:
#events_profile
#participants_profile
#new_users_profile
In [7]:
display(events.info(), participants.info(), new_users.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 423761 entries, 0 to 423760
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   user_id     423761 non-null  object 
 1   event_dt    423761 non-null  object 
 2   event_name  423761 non-null  object 
 3   details     60314 non-null   float64
dtypes: float64(1), object(3)
memory usage: 12.9+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14525 entries, 0 to 14524
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   user_id  14525 non-null  object
 1   group    14525 non-null  object
 2   ab_test  14525 non-null  object
dtypes: object(3)
memory usage: 340.6+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58703 entries, 0 to 58702
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user_id     58703 non-null  object
 1   first_date  58703 non-null  object
 2   region      58703 non-null  object
 3   device      58703 non-null  object
dtypes: object(4)
memory usage: 1.8+ MB
None
None
None

We have good quality datasets. There aren't duplicates and missing values (except for the detail column in events, which is understandable).

In [8]:
# change type of data and create new data column.
events['event_dt'] = pd.to_datetime(events['event_dt'])
events['date'] = events['event_dt'].astype('datetime64[D]')
In [9]:
# check dates of events from dataset
print('The period of time is {} days: from {:%Y-%m-%d} to {:%Y-%m-%d}'.format(
    (events['date'].max() - events['date'].min()).days, events['date'].min(), events['date'].max()))
The period of time is 23 days: from 2020-12-07 to 2020-12-30

The test supposes an observing of users events for 14 days since signing up.
Thus, we should take into account the period of registration of users between 2020-12-07 and 2020-12-16.

3. Data analysis

In [10]:
# handle events dataset
events = events.join(new_users.set_index('user_id'), on='user_id')
events['first_date'] = pd.to_datetime(events['first_date'])
events = events[~events['first_date'].isna()]
events['n_days_service'] = (events['date'] - events['first_date']).dt.days
events['in_time'] = events.apply(lambda x: 1 if x['n_days_service'] < 14 else 0, axis=1)
events = events[events['in_time'] == 1] # leave only events which did during 14 days after signing up

It seems to me users of group B (who use new interface) should be in the new_users dataset and the participants dataset as group B of interface_eu_test. Users of control group A (who use old interface) should be in the new_users dataset and the participants dataset as group A of interface_eu_test.

In [11]:
# make A and B groups
B_group_list = participants[(participants['group']=='B')&(participants['ab_test']=='interface_eu_test')]['user_id']
A_group_list = participants[(participants['group']=='A')&(participants['ab_test']=='interface_eu_test')]['user_id']
print('Number of users: Group A - {}, Group B - {}'.format(len(A_group_list),len(B_group_list)))
Number of users: Group A - 5467, Group B - 5383
In [12]:
# Check if there are users who are in both groups.
both_AB_lists = B_group_list[B_group_list.isin(A_group_list)]
print('Number of users who are in both A and B groups: {}'.format(len(both_AB_lists)))
Number of users who are in both A and B groups: 0
In [13]:
# Check if there are users who aren't in new_users.
new_users_vs_B = B_group_list[~B_group_list.isin(new_users['user_id'])]
new_users_vs_A = A_group_list[~A_group_list.isin(new_users['user_id'])]
print('Number of A group users who aren\'t in among new users: {}'.format(len(new_users_vs_A)))
print('Number of B group users who aren\'t in among new users: {}'.format(len(new_users_vs_B)))
Number of A group users who aren't in among new users: 0
Number of B group users who aren't in among new users: 0
In [14]:
# Select only users who from Europe and who signed up before 2020-12-16 for A and B groups.
A_group_list = A_group_list[A_group_list.isin(new_users[(new_users['region']=='EU')&
                                                          (new_users['first_date']<='2020-12-16')]['user_id'])]
B_group_list = B_group_list[B_group_list.isin(new_users[(new_users['region']=='EU')&
                                                          (new_users['first_date']<='2020-12-16')]['user_id'])]
print('Number of users after filtering: Group A - {}, Group B - {}'.format(len(A_group_list),len(B_group_list)))
Number of users after filtering: Group A - 3178, Group B - 3032
In [15]:
# Calculate number of users from EU who participants of the testing

print('Share of EU user who participants of the testing: {:.2%}'
      .format((len(A_group_list)+len(B_group_list)) / len(new_users[new_users['region']=='EU'])))
Share of EU user who participants of the testing: 14.31%
In [16]:
# Study how to distribute of events per users.
B_group_events = events.query('user_id in @B_group_list').groupby('user_id', as_index=False)['event_name'].count()
A_group_events = events.query('user_id in @A_group_list').groupby('user_id', as_index=False)['event_name'].count()
fig = go.Figure()
fig.add_trace(go.Box(y=B_group_events['event_name'], boxpoints="all", name='Group B', 
                     marker={'color':px.colors.qualitative.Set2[1]}))
fig.add_trace(go.Box(y=A_group_events['event_name'], boxpoints="all", name='Group A', 
                     marker={'color':px.colors.qualitative.Set2[2]}))
fig.update_layout(title={'text':'Distribution of events per user'})
fig.show()

The distributions looks the close enaugh. Let's leave the groups as they are.

In [17]:
# Study distribution of events by date
# plot a histograms
fig = go.Figure()
fig.add_trace(go.Histogram(x=events.query('user_id in @A_group_list')['date'], name='Group A',
                          marker={'color':px.colors.qualitative.Set2[3]}))
fig.add_trace(go.Histogram(x=events.query('user_id in @B_group_list')['date'], name='Group B',
                          marker={'color':px.colors.qualitative.Set2[4]}))
fig.update_layout(title={'text':'Distribution of events by date'})
fig.show()

Also, the distribution is without surprises in the point of difference between groups. The peak of events observed at 14 Dec. 25 Dec no events because it's Christmas.

In [18]:
# Try to make funnel for groups.

funnel_table = events.query('user_id in @A_group_list').groupby('event_name', as_index=False).agg({'user_id':'nunique'})\
                                                                           .sort_values('user_id', ascending=False)
funnel_table_B = events.query('user_id in @B_group_list').groupby('event_name').agg({'user_id':'nunique'})
funnel_table = funnel_table.join(funnel_table_B, on='event_name', rsuffix='B')
funnel_table.columns=(['events','group_A', 'group_A'])
funnel_table
Out[18]:
events group_A group_A
0 login 3176 3031
2 product_page 2087 1968
3 purchase 1120 1010
1 product_cart 1033 1017

As we can see the funnel looks weird. The purchase stage has more users than the product cart stage. It's because some users got in the purchase stage from the product page. For to get the right results of conversion let's delete from the purchase stage of users who aren't to cart stage.

In [19]:
user_cart_list = events.query('event_name == "product_cart"')['user_id']
user_abnormal_purchase_list = events[(events['event_name']=='purchase')&(events['user_id']\
                                     .isin(user_cart_list))]['user_id'].unique()
#events_cut = events[events['event_name']!='login'] # drop the login stage
# mark abnormal users
events['abn_users'] = np.where((events['event_name']=='purchase') & 
                                   (events['user_id'].isin(user_abnormal_purchase_list)),
                                  1,0)
events = events[events['abn_users']==0] # drop abnormal users
In [20]:
# create a funnel
funnel_A = events.query('user_id in @A_group_list').groupby('event_name', as_index=False)\
                                                                           .agg({'user_id':'nunique'})\
                                                                           .sort_values('user_id', ascending=False)
funnel_A.columns=(['events','n_users'])
funnel_B = events.query('user_id in @B_group_list').groupby('event_name', as_index=False)\
                                                                           .agg({'user_id':'nunique'})\
                                                                           .sort_values('user_id', ascending=False)
funnel_B.columns=(['events','n_users'])
fig = go.Figure()
fig.add_trace(go.Funnel(
    y = funnel_A['events'], x = funnel_A['n_users'],
    textinfo = 'percent previous + value', name='Group A'))
fig.add_trace(go.Funnel(
    y = funnel_B['events'], x = funnel_B['n_users'],
    textinfo = 'percent previous + value', name='Group B'))
fig.update_layout(title={'text':'Events funnels with share of users from the previous stage'},
                 yaxis=dict(title='Events'), colorway = [px.colors.qualitative.Set2[2],
                                                         px.colors.qualitative.Set2[6]])
fig.show()

As we can see from the funnel chart, conversions of Group B did not live up to expectations. Results for Group A better than for Group. There is only one better result for Group B for the product cart stage. But the difference is only 3%.

In [21]:
# Study a revenue
revenue_A = events.query('user_id in @A_group_list').groupby('user_id')['details'].sum().sum()
revenue_B = events.query('user_id in @B_group_list').groupby('user_id')['details'].sum().sum()

fig = go.Figure([go.Bar(x=['Group A', 'Group B'], y=[revenue_A, revenue_B],
                        text=[revenue_A, revenue_B], textposition='outside', 
                        marker={'color':px.colors.qualitative.Set2[7]})])

#fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig.update_layout(yaxis=dict(title='Revenue'), xaxis=dict(title='Groups'),
                 title={'text':'Revenue by group'})
                                                        
fig.show()

The revenue for Group A also higher than for Group B.

4. A/B testing

Compare funnels and conversions on the each stage.

In [22]:
funnel = funnel_A.join(funnel_B.set_index('events'), on='events', rsuffix='b').reset_index(drop=True)
funnel.columns=['stage','A','B']
funnel
Out[22]:
stage A B
0 login 3176 3031
1 product_page 2087 1968
2 product_cart 1033 1017
3 purchase 782 676
In [23]:
# make a function

def check_hypothesis(stage_1, stage_2, alpha):
    
    '''
    The function makes Z-test for A and B groups values. It accepts the names of the stage which conversions test
    between and the alpha level.  
    
    The functian return result of test: conversion rate values for groups, the difference of conversion rates,
                                        p-value and result of tasting (can reject or can't reject the null hypothesis)
    '''
    
    successes1=funnel[funnel['stage']==stage_2]['A'].iloc[0]
    trials1=funnel[funnel['stage']==stage_1]['A'].iloc[0]
    
    successes2=funnel[funnel['stage']==stage_2]['B'].iloc[0]
    trials2=funnel[funnel['stage']==stage_1]['B'].iloc[0]
    
    #proportion for success in the first group
    p1 = successes1/trials1

   #proportion for success in the second group
    p2 = successes2/trials2

    # proportion in a combined dataset
    p_combined = (successes1 + successes2) / (trials1 + trials2)

    difference = p1 - p2
    z_value = difference / math.sqrt(p_combined * (1 - p_combined) * (1/trials1 + 1/trials2))
    distr = st.norm(0, 1) 
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    
    print ('The', stage_2, 'conversion: Group A - {:.2%}, Group B - {:.2%}'.format(successes1/trials1,
                                                                                   successes2/trials2))
    print('The difference is: {:.2%}'.format(successes2/trials2-successes1/trials1))
    print ('The alpha level: ', alpha)
    print('p-value: ', p_value)
    if (p_value < alpha):
        print("We reject the null hypothesis. The conversion rate for stages: {} and {} between group A and group B have\
 a significant difference.".format(stage_1, stage_2))
    else:
        print("We can't reject the null hypothesis. The conversion rate for stages: {} and {} between group A and group B\
 are equal.".format(stage_1, stage_2))
   

Test for the product page conversion.

The formulation of hypotheses:

  • Null hypothesis: The product cart conversion rate (share of users who move from the product_page stage to the product_cart stage) between group A and group B are equal.
  • Alternative hypothesis: The product_cart conversion rate between group A and group B have a significant difference.
In [24]:
#Set up the alpha level:
alpha = 0.05
# make test
check_hypothesis('product_page', 'product_cart', alpha)
The product_cart conversion: Group A - 49.50%, Group B - 51.68%
The difference is: 2.18%
The alpha level:  0.05
p-value:  0.165243223204254
We can't reject the null hypothesis. The conversion rate for stages: product_page and product_cart between group A and group B are equal.

As we can see from result of the test, the product_cart conversion didn't change after the lanching of new interface of site.

Test for the purchase conversion.

The formulation of hypotheses:

  • Null hypothesis: The purchase conversion rate (share of users who move from the product_cart stage to the purchase stage) between group A and group B are equal.
  • Alternative hypothesis: The purchase conversion rate between group A and group B have a significant difference.
In [25]:
#Set up the alpha level:
alpha = 0.05
# make test
check_hypothesis('product_cart', 'purchase', alpha)
The purchase conversion: Group A - 75.70%, Group B - 66.47%
The difference is: -9.23%
The alpha level:  0.05
p-value:  3.99906172932063e-06
We reject the null hypothesis. The conversion rate for stages: product_cart and purchase between group A and group B have a significant difference.

As we can see from result of the test. The purchase conversion rate not only did not improve, but significantly deteriorated after the lanching of new interface of site.
But before we delete some users who jumped from product page directly to purchase page. Let's test conversion rate between those stages.

Test for the product page - purchase conversion.

The formulation of hypotheses:

  • Null hypothesis: The product page-purchase conversion rates (share of users who move from the product_page stage to the purchase stage) between group A and group B are equal.
  • Alternative hypothesis: The product page-purchase conversion rates between group A and group B have a significant difference.
In [26]:
# come back deletated users
funnel = funnel_table.query('events!="login"')
funnel.columns=['stage','A','B']
funnel
Out[26]:
stage A B
2 product_page 2087 1968
3 purchase 1120 1010
1 product_cart 1033 1017
In [27]:
#Set up the alpha level:
alpha = 0.05
# make test
check_hypothesis('product_page', 'purchase', alpha)
The purchase conversion: Group A - 53.67%, Group B - 51.32%
The difference is: -2.34%
The alpha level:  0.05
p-value:  0.13513329228006232
We can't reject the null hypothesis. The conversion rate for stages: product_page and purchase between group A and group B are equal.

The product_page - purchase conversion rate also didn't improve.

5. Conclusions

So, we have conducted A/B testing for new European users who have been performing actions on the site for 14 days.
We used two groups of users for the test: the test group B and the control group A. The test group B used the updated site interface during the testing period. The control group used the old interface. We expected the new interface to increase user conversions in the product cart and the purchase stages, and increase the total amount of sales revenue. As a result of analyzing user behavior, we found that group A and group B showed almost the same results. The result of the conversion to cart showed a significant difference (9,2%) in favor of group A. Group A also showed higher income.

Recommendations:

  • Make a survey for users who have used the new interface and ask them about the experience and usability of the interface.
  • Pause to bring the new interface into basic operation.
  • Make the interface improvements and repeat testing in 2 months.
In [ ]: